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Abstract: Many statistical models are algebraic in that they are defined by poly- 
nomial constraints or by parameterizations that are polynomial or rational maps. 
This opens the door for tools from computational algebraic geometry. These tools 
can be employed to solve equation systems arising in maximum likelihood estima- 
tion and parameter identification, but they also permit to study model singularities 
at which standard asymptotic approximations to the distribution of estimators and 
test statistics may no longer be valid. This paper demonstrates such applications of 
algebraic geometry in selected examples of Gaussian models, thereby complement- 
ing the existing literature on models for discrete variables. 
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1 Introduction 

Algebraic statistics applies algebraic geometry to gain insight in structure and prop- 
erties of statistical models, and to tackle computational problems arising in tasks 
of statistical inference. Work in this field has addressed, for example, exact tests in 
contingency tables, experimental design, phylogenetic trees, maximum likelihood 
estimation under multinomial sampling, and Bayesian networks; cf. IHlEj- Alge- 
braic geometry typically enters the playing field in one of two ways. On one hand, 
statistical models are sometimes derived from a simple saturated model by imposing 
constraints. These constraints may, in particular, be motivated by considerations 
of (conditional) independence, stationarity or homogeneity. If the constraints are 
polynomial constraints on the parameters of the saturated model, then the model 
corresponds to the intersection of an algebraic variety and the saturated parameter 
space. An algebraic variety is the solution set of a system of polynomial equations. 
On the other hand, many statistical models are defined via a parameterization 
rather than via constraints. However, if this parameterization is a polynomial, or 
more generally a rational map, then the model, which can be identified with the 
image of the parameterization map, is naturally embedded in an algebraic variety. 
This algebraic description of the model is often useful because it can reveal insights 
about the model that are not as readily obtained from the parameterization alone. 

Virtually all work in algebraic statistics considers purely discrete variables; see 
[H] for an exception. The sample space is then finite, and the objects of interest are 
algebraic varieties over a probability simplex. However, the philosophy of algebraic 
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statistics applies, regardless of the distributional setting, whenever a statistical 
model is an "algebraic" submodel of some natural supermodel. A particularly 
interesting case occurs when the supermodel is a regular exponential family, because 
algebraic submodels may inherit desirable statistical properties at points at which 
the submodel's local geometry is sufficiently "regular." 

Discussing simple problems from parameter identification and likelihood ratio 
testing, this paper demonstrates algebraic techniques for Gaussian models, i.e., 
families of (non-singular) multivariate normal distributions. Section [21 reviews the 
normal distribution and introduces the algebraic point of view. Section treats 
the problem of identification of a graphical model with hidden variables. Section 
01 is devoted to model singularities, at which standard x 2 -approximations to the 
distribution of the likelihood ratio test statistic may no longer be valid. 



2 Algebraic Gaussian models 

Let Mpj P and R p * d p be the cones of positive definite and positive semi-definite sym- 
metric p x p-matrices, respectively. The multivariate normal distribution A^,(/x, E) 
with mean vector p = (/xi, . . . G MP and covariance matrix E = (<7y) G ^pd P 
is the probability distribution on MP that has Lebesgue density function 



fu e(x) = — ; exp 1 — — (x — u)*E 1 (x ~ u) \ , 

^ V ' ^/(2tt)p det(E) 1 2 l w V W J 



x g 



A Gaussian (statistical) model with mean parameter space M C W x K.p d p is the 
family of multivariate normal distributions {Af p (fi,T,) | (p, E) € M}. 

Proposition 2.1 ( Q. p. 194]). The saturated Gaussian model, that is, the family 
of all multivariate normal distributions on R p , which has mean parameter space 
M = MP x M.pj P , forms a regular exponential family with sufficient statistics x G M p 
and xx l G Rp^f- The natural parameters are E _1 /i G M p and E -1 G Rp^ p - 

Statistical modelling in the Gaussian framework involves hypotheses about struc- 
tural relationships among the components of the mean parameters p. and E. In 
many interesting cases, such a relationship comes from a parameterization. If this 
parameterization is rational, as detailed in the following definition, then the re- 
sulting model can be studied taking an algebraic point of view. Recall that a set 
O C M d is semi-algebraic if it is a union of sets of points satisfying polynomial 
equalities and inequalities; compare Chapter 2 in [2]. 

Definition 2.2. A Gaussian model is a parametric algebraic model if its mean 
parameter space M = f(8), where C M d is an open semi-algebraic set, and 

f : 9 -» R p x R p * p 

fi(0),...,f?-(0), p(0),... 
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is a rational map defined everywhere on O. In other words, the functions gk, hk, gij 
and hij are polynomial functions such that g" hk{&) for all k G [p] := {1, . . . ,p} 
and £ %(6) for all (i, j) G [p] 2 . 

Not all statistical models of interest are specified in terms of a parameterization; 
instead the model may be specified implicitly in the form of constraints on the mean 
parameters [i and E. One important class of constraints arises from conditional 
independence, which in the multivariate normal distribution corresponds to well- 
known polynomial conditions on the covariance matrix E. 

Proposition 2.3. Let X be a random vector in W that follows a multivariate 
normal distribution J\f p ((i, E) , in symbols, X ~ A/" p (/i,E). For three pairwise disjoint 
index sets A.B.CQ [p] :— {1, . . . ,p}, it holds that 

X A ALX B \X C det(E {i}uCx{j}uC ) = Vt G A, j G B. 

Here, Xa-^-Xb \ Xc denotes conditional independence of Xa and Xb given Xc, 
and Xa-^-Xb \ X$ denotes marginal independence of Xa and Xb- 

The fact that conditional independence is an algebraic condition motivates the 
next definition, in which 

M[/ifc,o-y | i,j,k G [p], i < j] 

denotes the ring of polynomials in the entries u,k and of the mean vector u, and 
the covariance matrix E. 

Definition 2.4. A Gaussian model is an implicit algebraic model if its mean pa- 
rameter space M is equal to the intersection of an algebraic variety V and the 
Cartesian product W x Kp^ p . In other words, there exist polynomials fx,..., ft G 
°ij I i)3> & G [p], i < j] such that 

M = {(/x, E) G M p x | fi((i, E) = • • • = / t ( M) E) = 0} . 

The next fact is a consequence of the Tarski-Seidenberg Theorem |3 Thm. 2.3.4]. 

Proposition 2.5. 27ie mean parameter space of an algebraic Gaussian model, para- 
metric or implicit, is a semi-algebraic set. 

An immediate consequence of Proposition 12.51 is that an algebraic Gaussian 
model always has a well-defined dimension, namely, the dimension dim(M) of 
the semi-algebraic mean parameter space [2. Def. 2.5.3]. In the parametric case, 
dim(Af) = dim(f (0)) can be determined as the maximal rank of the Jacobian of the 
rational parameterization map f . In the implicit case, dim(M) can be computed 
based on Grobner basis techniques |BI Thm. 3.7]. An implicit algebraic model need 
not be a parametric algebraic model, and vice versa. Nevertheless, a technique 
known as implicitization, cf. 3, §3.3] and §3.2], permits to find a unique small- 
est implicit model whose mean parameter space M contains the mean parameter 
space f(O) of a given parametric model while satisfying dim(Af) = dim(f(8)). 
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Figure 1: Acyclic digraph for a hidden variable model. 

3 Identifiability 

When specifying a parametric statistical model, one of the first concerns is whether 
the model is identifiable, that is, whether the parameters uniquely specify proba- 
bility distributions in the model. 

Definition 3.1. Consider a parametric Gaussian model with mean parameter space 
M = f(0) given as the image of a map f : 6 — * M. p x K.p d p - The model is 

(i) globally identifiable at 9 e if f _1 (f(0 o )) = {# }; 

(ii) locally identifiable at 8q G if there exists a ball B £ (9q) with center 9q and 
radius e > such that f- 1 (f(6» )) H B e (6 ) = {8 }. 

If the model is globally identifiable at all points in 0, that is, if the map f : G — > M 
is a bijection, then we say that the model is identifiable. 

For parametric algebraic models, global and local identifiability at a given point 
9q G © can be investigated by studying whether a system of polynomial equations 
deduced from the possibly rational equation system f(#o) = f(#) nas a (locally) 
unique solution 6 € O. We illustrate this in the following example. 

Consider the directed graphical Gaussian model with one hidden variable de- 
picted in Figure H which shows the relationship between p — 4 observed vari- 
ables (shaded nodes) and one hidden variable H . For simplicity we consider the 
model comprising only centered distributions. This model is a parametric alge- 
braic model with mean parameter space M — f(0) given by a polynomial map. 
The points in the parameterization domain = K 4 x (0, oo) 4 are vectors 9 = 
{0i,02, (33,04:, vi,u>2, u>3,u}4). Here, the four regression coefficients 0i £ K appear 
in conditional means, namely K[H | Xi,X2] = 0iX\ + 02X2 and E[X; | H] = 0iH 
if i = 3, 4. The variances uji > are either marginal or conditional variances, 
oji = Va,r[Xi] for i = 1,2, and u>i — Var[X; | H] for i = 3,4. The parameterization 
map is 

f : -> R 4 x M 4 * 4 
6^[0,fx{9)], 
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where fs (9) is the symmetric covariance matrix 

fui PzPx^x PaPx^i ^ 

U) 2 PzPl^% PiPi^l 

uj 3 + 0&{fi\ui + P 2 2 co 2 + 1) /3 4 /3 3 (/3i 2 wi + P^ 2 + 1) 
\ ^ 4 + /3|(/3 1 V + /3 2 2 W2 + l)/ 



(1) 



Note that we set the conditional variance Var[_ff | X\,X^\ = 1 because the image of 
the parameterization map remains unchanged if a free parameter for this variance 
is introduced. For details on such parameterizations see e.g. ^3 §8]. 

Is this parametric hidden variable model globally (or locally) identifiable at #0 € 
0? We can answer this question by studying the system of equations fs(6>o) = fe(^) 
in which the components Pio and Wjo of 9q are fixed numbers and the components 
Pi and u>i of 9 are indeterminants. From QJ, it is apparent that if f (9) = f(#o) then 
lj\ — wio and oj 2 = lo 2 q. Additional consequences can be worked out by hand, but 
we can also let the computer do this for us. 

Running the code in Table H m the computer algebra system Singular [J] 
informs us that the model is not globally identifiable at generic 9q & & because the 
solution set f _1 (f (9q)) generally contains mult(j) = 2 isolated points (dim(j) = 0). 
The computed Grobner basis (see |S] for the relevant background) 

> J; 

J[l]=w4+(-w40) 

J[2]=w3+(-w30) 

J [3] =w2+(-w20) 

J [4] =wl+(-wl0) 

J [5] = (b40) *b3+ (-b30) *b4 

J [6] = (-b40) *b2+ (b20) *b4 

J [7] = (-b40) *bl+ (blO) *b4 

J [8] =b4~2+(-b4CT2) 

suggests that for /3 40 7^ 0, it holds that 

f _1 (f (9 Q )) = {(Pxo, p 40 , wio, • ■ ■ , u 4Q y, (-P w , -Paq, wio, ■ ■ • , ^40)'}. (2) 

However, in the Grobner basis computation simplifications are made that are only 
valid if blO, . . . ,w40 are generic. In other words, during the computation a (finite) 
number of polynomial expressions in blO, . . . ,w40 are assumed to be non-zero. So 
while we can conclude that © holds for almost every 9q G 9, it may and does 
fail at certain points in the parameter domain O. For example, J2J) does not hold 
if P40 = 0, in which case it is possible that dim(J) e {1 7 2} indicating failure of 
local identifiability. In conclusion, the computation shows that the model is locally 
identifiable at almost every 9q € O. In more complicated models, computations 
treating 9q as symbolic quantity may become prohibitive. However, solving the 
system f(#o) = f(0) for a particular numeric vector 9q may still be feasible and 
informative. 
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LIB "linalg.lib"; option(redSB) ; 
ring R = (0,bl0,b20,b30,b40,wl0,w20,w30,w40) , 
(bl,b2,b3,b4,wl,w2,w3,w4) ,dp; 

// bl w4 are indeterminants ; bl0,...,w40 are symbolic parameters 

matrix B[5][5] = 1,0,0,0,0, 

0,1,0,0,0, 

0,0,l,0,-b3, 

0,0,0,l,-b4, 

-bl,-b2, 0,0,1; 
matrix W[5] [5] = wl, 0,0, 0,0, 

0,w2, 0,0,0, 

0,0,w3,0,0, 

0,0,0,w4,0, 

0,0,0,0,1; 
matrix BO [5] [5] = 1,0,0,0,0, 

0,1,0,0,0, 

0,0,l,0,-b30, 

0,0,0,l,-b40, 

-bl0,-b20, 0,0,1; 
matrix WO [5] [5] = wl0,0,0,0,0, 

0,w20, 0,0,0, 

0,0,w30,0,0, 

0,0,0,w40,0, 

0,0,0,0,1; 

matrix f [4] [4] = submat (inverse (B) *W*inverse (transpose (B) ), 1 .. 4, 1 .. 4) ; 
matrix f0[4][4] = submat (inverse(BO) *W0*inverse (transpose (B0) ) , 1 . .4, 1 . .4) ; 
ideal 1=0; int i,j; 

for(i=l; i<=4; i++){ for(j=i; j<=4; j++){ 

1=1+ ideal(f [i , j] -f [i , j] ) ; // identif iability equations 

} } 

ideal J = std(I) ; // Groebner basis for ideal I 

dim( J) ; mult ( J) ; 

Table 1: Code from session in computer algebra system Singular. 



4 Singularities of Gaussian models 

The saturated Gaussian model is a regular exponential family (Proposition I2.1fl . 
Therefore, under "regularity" conditions, results about asymptotic distributions of 
MLE and likelihood ratio test statistic can be transfered to submodels. Suppose 
the submodel is an algebraic model with mean parameter space M containing the 
true distribution Af(p,Q, So). If M is a smooth manifold, in which case the submodel 
is a curved exponential family, then regardless of where the true parameter (/io, So) 
is located, the MLE is asymptotically normal as the sample size tends to infinity. 
Moreover, the likelihood ratio test statistic for testing the submodel against the 
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Figure 2: The parameter spaces of two simple algebraic Gaussian models. 

saturated model has an asymptotic ^-distribution with degrees of freedom equal 
to the codimension of M, that is, the difference between the dimension of the 
saturated mean parameter space and dim(M). 

These standard results need no longer be true if one leaves the realm of curved 
exponential families. For example, if inequality constraints are imposed on the 
mean parameter space of a curved exponential family, boundary effects may be 
created. More subtly, the regularity conditions may be violated at points that are 
"singularities" in the mean parameter space of an algebraic Gaussian model. For 
a rigorous definition of singularities of algebraic varieties, see e.g. [21 §3.2]. In the 
examples in this section the singularities are obvious and intuitive. However, this 
will not necessarily be the case in larger models, in which case computer algebra 
software is very helpful for locating singular points. In particular, the software 
Singular offers the command slocus for computation of singular loci. 

4.1 Simple bivariate examples under independence 

Issues with singularities can be illustrated nicely with bivariate normal distribu- 
tions. For a closed set C C R 2 , let M c = C x {I 2 } be the mean (and natural) 
parameter space of the model of all bivariate normal distributions with mean vector 
fi G C and covariance matrix S equal to the identity matrix I 2 G R 2x2 . In this case 
the MLE fi for the model with mean parameter space Mc is the point in C that 
is closest to the sample mean vector X in Euclidean distance. The likelihood ratio 
test statistic Ac for testing /j, G Mc versus ju G M R 2 is equal to the product of the 
sample size n and the squared Euclidean distance of X and C. 

Example 4.1 (Folium of Descartes). Let C — {/i G M 2 | /i 2 = /if + /1 2 }, which 
is a curve that can be parameterized as i(9) = [6 2 — 1,0(9"* — 1)]. The curve is 
shown in the left plot in Figure The algebraic model with mean parameter space 
Mc is not a curved exponential family due to the singularity at the point of self- 
intersection, which is fi — 0. The dashed lines ^2 = ±/ii in the plot are orthogonal 
to each other and indicate the tangent cone at fi = 0. If the true parameter point 
is fi = 0, then the asymptotic distribution of the likelihood ratio test statistic Ac is 
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given by the squared Euclidean distance between a draw from A/"(0, 7 2 ) and the two 
orthogonal lines. This asymptotic distribution is the distribution of the minimum 
of two independent ^-random variables. □ 

Example 4.2 (Neil's parabola). Let C — {[J, € R 2 | /i| = fif} be the curve with 
parameterization f((9) = (9 2 , 9 3 ), which is shown in the right-most picture of Figure 
121 The algebraic model with mean parameter space Mc is again not a curved ex- 
ponential family due to the singularity at the cusp point fi = 0. For true parameter 
point 11 = 0, the likelihood ratio test statistic Ac has an asymptotic distribution 
that is the mixture of a Xi~ an d a ^-distribution. This mixture distribution is the 
distribution of the squared Euclidean distance between a draw from J\f(0, I2) and 
the (dashed) half-ray {ri | ^ > 0,^2 = 0}. □ 

Examples 14.11 and 14.21 demonstrate non-standard asymptotics at model singular- 
ities. At regular points in the respective mean parameter spaces the usual asymp- 
totics apply. However, if the true parameter forms a regular point that is close to 
the singular locus then a very large sample size may be required in order for the 
usual asymptotics to provide good approximations to the distributions of estimators 
and test statistics. 

4.2 A conditional independence model with singularities 

Many conditional independence models, in particular graphical models, form curved 
exponential families. However, singularities may arise from combining arbitrary 
independence constraints. Consider, for example, the model of trivariate normal 
distributions under which a random vector satisfies X1JLX2 and simultaneously 
X1ALX2 I X3. By Proposition 12.31 the model is an implicit algebraic model with 
mean parameter space 

M = {(/i, S) G M 3 x R^xa J ai2 = Q) de t(E {1 3 }><{2i 3 } ) = a 12 o- 33 - a 13 a 23 = o} 

The set M is defined by the joint vanishing of the two polynomials fx — o\2 and 
H = ci3<723- We see that 

M = M13 U M 23 := {(/x, S) e M | a 12 = cr 13 = 0} U {(/i, E) G M | <t 12 = a 23 = 0}. 

This reflects the well-known fact that 

[X^X 2 A X^X 2 \X 3 ] ^ [X 1 JL(X2,X 3 ) V X 2 MXi,X 3 )\, 

which also holds for distributions other than the multivariate normal; compare 01 
Thm. 8.3]. The singular locus of M is the intersection 

M sing = M13 n M 23 = {O, S) G M I <7 12 = a 13 = a 23 = 0}, 

which corresponds to diagonal S, i.e., complete independence XiALX2-ii-X 3 . 
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The likelihood ratio test statistic for testing the model with mean parameter 
space M against the saturated model can be expressed as 

x ■ J, ( s ll det (-5'{2,3}x{2,3}) ^ . ( S22 det(5 { i, 3} x { i, 3 } ) \1 . . 

X = n ■ mm |log _ j , log ^ j j . (3) 

If (/i, E) € M \ Mging, then A converges to a xi-distribution for n — ► oo; over the 
singular locus the limiting distribution is non-standard. 

Proposition 4.3. Let (p,, E) € M s i ng , and Zet W\2, Wis, W23 be three independent 
xi-random variables. As n — > 00, </ie likelihood ratio test statistic X converges to 
the minimum of two dependent X2~ distributed random variables, namely, 

A — > d min(VKi2 + W13, W 12 + W23) = W 12 + mm(W 13 , W 2 3)- 

Proof. For i G {1, 2, 3} and A C {1, 2, 3}, let 

Sii.A = s ii — S'{i}x J 4'S' J 4xA'^'-4x{i}' 

The likelihood ratio test statistic can be rewritten as 

A = n log ( 511522 2 ) + n • min (log («»*-) , log } . (4) 

\SnS22 ~ Sf 2 j { \ s 33.12/ \ S 33.12/J 

Recall that 

Vn[(sil, S12, Sl3, S 2 2, S 2 3, S 33 )* - (an, CT12, CT13, CT22, C23, 0-33)*] — > d 

k°~jm + 0~im.O~jk)ij,km) ■ (5) 

Since (//, E) € M S i ng implies that E is diagonal, the covariance matrix of the normal 
distribution in (0, known as the Isserlis matrix of E, is diagonal. Using an expan- 
sion up to the Hessian in the delta-method |1 II §3.3], we can show that the three 
logarithmic terms in Q converge to three independent x^-random variables. □ 

5 Conclusion 

The goal of this paper was to demonstrate the usefulness of algebraic geometry 
for studying properties of statistical (Gaussian) models. In order to keep intuition 
alive, the examples in this paper were chosen to be rather simple, but algebraic 
geometry can also provide useful insights in larger, less tractable models. 

Two particular problems were visited in this paper. First, parameter identi- 
fiability often gives rise to polynomial equation systems, the structure of which 
becomes more transparent when the equations are presented in Grobner basis form 
(Section Second, model singularities can result into non-standard asymptotics 
(Section EJ. Locating singularities and working out the associated asymptotics are 
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the first steps towards solving the challenging problem of divising sensible statis- 
tical procedures for models with singularities. Finally, we remark that methods 
combining Grobner basis techniques with numerical solving can also be used to 
compute all solutions to interesting likelihood equations, compare e.g. j^j. 
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